Self-Timed  FIFO: 


An  Exercise  in  Compiling  Programs  into  VLSI  Circuits 


Alain  J.  Martin 


Computer  Science  Department 
California Institute of Technology

5211:TR:86


The  research  described  in  this  paper  was  sponsored  by 
the Defense Advanced Research Projects Agency, ARPA Order No. 3771,
and  monitored  by  the  Office  of  Naval  Research 
under  contract  number  N00014-79-C-0597 

©  California  Institute  of  Technology,  1986 
published  in 

IFIP WG 10.2 International Working Conference
on  "From  HDL  Descriptions  to  Guaranteed  Correct  Circuit  Designs", 
Grenoble,  France  9-11  September  1986.  D.  Borrione  (ed) 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 


1. Report date: 1986

3. Dates covered: 00-00-1986 to 00-00-1986

4. Title and subtitle: Self-Timed FIFO: An Exercise in Compiling Programs into VLSI Circuits

7. Performing organization name(s) and address(es): Defense Advanced Research Projects Agency, 3701 North Fairfax Drive, Arlington, VA, 22203-1714

12. Distribution/availability statement: Approved for public release; distribution unlimited

14. Abstract: see report

16. Security classification of: a. report: unclassified; b. abstract: unclassified; c. this page: unclassified

18. Number of pages: 23

Standard Form 298 (Rev. 8-98)

Prescribed by ANSI Std Z39-18


A method for compiling a high-level description of a computation (a set of communicating processes) into a self-timed VLSI circuit is explained with an example: the construction of a self-timed FIFO element. The method essentially relies on the four-phase handshaking expansion of the communication actions. The program of each process is compiled into a set of “production rules” from which all explicit sequencing has been removed. By matching the production rules to those describing the semantics of the VLSI operators (and-gate, or-gate, C-element, arbiter, etc.), the programs are identified with networks of operators. We show how the different heuristics that the method allows lead to different circuits. In particular, the example illustrates the trade-offs between simplicity and efficiency of the circuits.


1.  INTRODUCTION 

We have developed a method for “compiling” a high-level description of a computation (a set of communicating processes) into a self-timed VLSI circuit. Self-timed [8] (or delay-insensitive [9]) circuits are sequential circuits in which the sequencing is enforced entirely by communication mechanisms. No clock signals are used, and no assumption is made about the delays in operators and wires except that they are finite. Self-timed circuits have several advantages. First, as circuits grow larger, it becomes increasingly difficult to distribute a clock signal safely across a chip. Second, clocked circuits rely on worst-case assumptions about the timing behavior of their components, which degrades their performance. Third, since there is no restriction on the length of wires, layout is facilitated.

In the method we propose, the computation is initially described as a set of communicating processes in the notation of [3], which is somewhat similar to C.A.R. Hoare’s CSP [2]. This first description is the reference solution, which has to be proved correct. The program is then compiled into a delay-insensitive circuit by applying a series of semantics-preserving transformations. Hence the circuit obtained is correct by construction: all semantic properties that can be proved of the program hold for the circuit as well.

The compilation is systematic and essentially relies on the four-phase handshaking implementation of communication actions. The program of each process is compiled into a set of “production rules” from which all explicit sequencing has been removed. By matching these production rules to those describing the semantics of the VLSI operators (and-gate, or-gate, C-element, arbiter, etc.), the programs are identified with networks of operators, i.e., self-timed circuits.

The method has been applied to a whole spectrum of problems, some of them quite difficult, such as distributed mutual exclusion [4] and fair arbitration [5]. The results have far exceeded our expectations. For most circuits, especially complex ones, the compiled circuits are superior to their “hand-designed” counterparts, i.e., they are simpler and use fewer operators, in particular fewer state-holding operators. A general description of the method can be found in [4] and [6].

As an exercise in applying the method, we will construct circuits corresponding to a self-timed FIFO-element. We will see how the different alternatives that the method allows lead to different solutions. We first present the program notation and the VLSI operators that constitute the “object code”. We then describe the four steps of the compilation and illustrate the method by constructing different versions of the FIFO-element.

2.  THE  PROGRAM  NOTATION 
Sequential  part 

For the sequential part of the algorithm, we use a subset of Edsger W. Dijkstra’s guarded command language [1], with a slightly different syntax. We give only a very informal definition of the semantics of the constructs used.

i) $b\uparrow$ stands for $b := \mathit{true}$, and $b\downarrow$ stands for $b := \mathit{false}$.

ii) The execution of the selection command $[G_1 \rightarrow S_1 \mid \ldots \mid G_n \rightarrow S_n]$, where $G_1$ through $G_n$ are Boolean expressions and $S_1$ through $S_n$ are program parts ($G_i$ is called a “guard”, and $G_i \rightarrow S_i$ a “guarded command”), amounts to the execution of an arbitrary $S_i$ for which $G_i$ holds. If $\neg(G_1 \vee \ldots \vee G_n)$ holds, the execution of the command is suspended until $(G_1 \vee \ldots \vee G_n)$ holds.

iii) Besides the usual sequencing operator, the semicolon, we introduce a weaker sequencing operator, the comma. For atomic actions $x$ and $y$, “$x, y$” stands for the execution of $x$ and $y$ in any order.

iv) $[G]$, where $G$ is a Boolean expression, stands for $[G \rightarrow \mathit{skip}]$, and thus for “wait until $G$ holds”. (Hence “$[G]; S$” and “$[G \rightarrow S]$” are equivalent.)

v) $*[S]$ stands for “repeat $S$ forever”.

vi) From ii) and v), the operational description of the statement $*[[G_1 \rightarrow S_1 \mid \ldots \mid G_n \rightarrow S_n]]$ is “repeat forever: wait until some $G_i$ holds; execute an $S_i$ for which $G_i$ holds”.
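The informal semantics of ii) and vi) can be mimicked in a few lines of Python. This is only a sequential sketch under our own assumptions: guards and statements are modeled as functions over a state dictionary, the arbitrary choice is made with a seeded random generator, and a suspended selection simply returns False, whereas a real process would wait until another process makes a guard true. The helper names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Guard = Callable[[State], bool]
Action = Callable[[State], None]


def select(state: State, commands: List[Tuple[Guard, Action]],
           rng: random.Random) -> bool:
    """Execute [G1 -> S1 | ... | Gn -> Sn] once.

    Runs an arbitrary S_i whose guard G_i holds and returns True. If no
    guard holds, the command is suspended; this sketch returns False.
    """
    enabled = [action for guard, action in commands if guard(state)]
    if not enabled:
        return False
    rng.choice(enabled)(state)  # arbitrary choice among the true guards
    return True


def repeat(state: State, commands: List[Tuple[Guard, Action]],
           rng: random.Random, max_steps: int) -> int:
    """Bounded version of *[[G1 -> S1 | ... | Gn -> Sn]]; returns steps run."""
    steps = 0
    while steps < max_steps and select(state, commands, rng):
        steps += 1
    return steps


def assign(var: str, value: bool) -> Action:
    """b↑ is assign('b', True) and b↓ is assign('b', False)."""
    def action(state: State) -> None:
        state[var] = value
    return action


# *[[¬b -> b↑ | b -> b↓]]: exactly one guard holds in every state.
toggle = [(lambda s: not s["b"], assign("b", True)),
          (lambda s: s["b"], assign("b", False))]
```

Starting from `{"b": False}`, five iterations of `repeat(state, toggle, random.Random(0), 5)` leave `b` true, since the two guards are mutually exclusive and the choice is therefore forced.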

Communicating  processes 

A concurrent computation is described as a set of processes composed by the usual parallel composition operator $\|$. Processes communicate with each other by communication actions on channels; they do not share variables. When no messages are transmitted, communication on a channel is reduced to synchronization signals. The name of the channel is then sufficient for identifying a communication action.

If two processes $p_1$ and $p_2$ share a channel named X in $p_1$ and Y in $p_2$, at any time the number of completed X-actions in $p_1$ equals the number of completed Y-actions in $p_2$. In other words, the completion of the $n$-th X-action “coincides” with the completion of the $n$-th Y-action. If, for example, $p_1$ reaches the $n$-th X-action before $p_2$ reaches the $n$-th Y-action, the completion of X is suspended until $p_2$ reaches Y. The X-action is then said to be pending. When thereafter $p_2$ reaches Y, both X and Y are completed. The predicate “X is pending” is denoted $qX$. If, for an arbitrary command A, $cA$ denotes the number of completed A-actions, the semantics of a pair (X, Y) of communication commands is expressed by the two axioms:

$$cX = cY \qquad \text{(A1)}$$

$$\neg qX \vee \neg qY. \qquad \text{(A2)}$$


Probe 

Instead of the usual selection mechanism by which a set of pending communication actions can be selected for execution, we provide a general Boolean command on channels, called the probe. In the original definition given in [3], the probe command $\overline{X}$ in process $p_1$ has the same value as $qY$. Here, we use a weaker definition, namely:

$$\overline{X} \Rightarrow qY$$

$$qY \Rightarrow \Diamond \overline{X},$$

where $\Diamond P$ means “$P$ holds eventually”. For example, a construct of the form

$$[\overline{X} \rightarrow X \mid \overline{Z} \rightarrow Z]$$

can be informally interpreted as “if a communication action is pending at the other end of channel X, fire X; if a communication action is pending at the other end of channel Z, fire Z”.


3.  THE  “OBJECT  CODE” 

The set of operators with which we build circuits is not unique. In this introduction, we will use the simple set consisting of and, or, exclusive-or, C-element, enabled C-element, wire, and fork. Each operator is described by a set of production rules. A production rule is similar to a guarded command, and we shall therefore use a similar syntax. There are, however, important semantic differences. Consider the production rule $G \mapsto S$:

• $S$ is either a simple assignment or of the form “$s_1, s_2$”, where $s_1$ and $s_2$ are each a simple assignment.

• If $G$ holds, the correct execution of $S$ is guaranteed only if $G$ remains invariantly true until the completion of $S$. We say that $G$ must be stable.

• Unlike the guarded commands of a selection or a repetition, the mutual exclusion among the different production rules of a set is not guaranteed automatically. It has to be enforced by the semantics of the program.

• If stability of the guards and mutual exclusion among guards are guaranteed, the production rule set PRS is semantically equivalent to the repetition $*[[GCS]]$, where GCS is the guarded command set syntactically identical to PRS.

The descriptions of the operators used in this paper in terms of their production rules and their logic symbols are as follows.
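The C-element:

$$(x, y)\ \mathrm{C}\ z \;=\; \left\{\begin{array}{l} x \wedge y \mapsto z\uparrow \\ \neg x \wedge \neg y \mapsto z\downarrow \end{array}\right.$$

The enabled C-element:

$$(x, y; u)\ \mathrm{eC}\ z \;=\; \left\{\begin{array}{l} x \wedge y \wedge u \mapsto z\uparrow \\ \neg x \wedge \neg y \wedge u \mapsto z\downarrow \end{array}\right.$$

The “and”:

$$(x, y) \wedge z \;=\; \left\{\begin{array}{l} x \wedge y \mapsto z\uparrow \\ \neg x \vee \neg y \mapsto z\downarrow \end{array}\right.$$

The “or”:

$$(x, y) \vee z \;=\; \left\{\begin{array}{l} x \vee y \mapsto z\uparrow \\ \neg x \wedge \neg y \mapsto z\downarrow \end{array}\right.$$

The “exclusive-or”:

$$(x, y)\ \mathrm{xor}\ z \;=\; \left\{\begin{array}{l} x \neq y \mapsto z\uparrow \\ x = y \mapsto z\downarrow \end{array}\right.$$

The wire:

$$x\ \mathrm{w}\ y \;=\; \left\{\begin{array}{l} x \mapsto y\uparrow \\ \neg x \mapsto y\downarrow \end{array}\right.$$

The fork:

$$x\ \mathrm{f}\ (y, z) \;=\; \left\{\begin{array}{l} x \mapsto y\uparrow, z\uparrow \\ \neg x \mapsto y\downarrow, z\downarrow \end{array}\right.$$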

Any input or output variable of an operator may be negated. In particular, a wire with its input or its output negated (but not both) is an inverter. A negated input or output is represented in the figures by a small circle on the corresponding line.
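To make these definitions concrete, the following Python sketch (the function names are ours, not the paper’s) computes the value each operator’s output takes after whichever of its production rules is enabled has fired. The combinational operators (and, or, exclusive-or, wire, fork) have a rule enabled in every state, so their output is a function of their inputs. The two C-elements are state-holding: in “mixed” input states neither rule is enabled, and the output keeps its previous value.

```python
from typing import Tuple


def c_element(x: bool, y: bool, z: bool) -> bool:
    """(x, y) C z: z rises when both inputs are true, falls when both are
    false, and otherwise keeps its current value."""
    if x and y:              # x ∧ y ↦ z↑
        return True
    if not x and not y:      # ¬x ∧ ¬y ↦ z↓
        return False
    return z                 # no rule enabled: z holds its value


def enabled_c_element(x: bool, y: bool, u: bool, z: bool) -> bool:
    """(x, y; u) eC z: a C-element whose two rules also require u."""
    if u and x and y:
        return True
    if u and not x and not y:
        return False
    return z


def and_gate(x: bool, y: bool) -> bool:
    return x and y           # x ∧ y ↦ z↑ ; ¬x ∨ ¬y ↦ z↓


def or_gate(x: bool, y: bool) -> bool:
    return x or y            # x ∨ y ↦ z↑ ; ¬x ∧ ¬y ↦ z↓


def xor_gate(x: bool, y: bool) -> bool:
    return x != y            # x ≠ y ↦ z↑ ; x = y ↦ z↓


def wire(x: bool) -> bool:
    return x


def fork(x: bool) -> Tuple[bool, bool]:
    return (x, x)


def inverter(x: bool) -> bool:
    """A wire with its input (or its output) negated."""
    return not x
```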

4.  THE  COMPILATION  METHOD 
Process  Decomposition 


The first step of the compilation, called “process decomposition”, consists of replacing a process by several semantically equivalent processes. The purpose of the decomposition is to obtain a process representation of the program in which the right-hand side of each guarded command is a straight-line program, i.e., consists only of simple assignments and communication commands, composed by semicolons and commas.
Decomposition rule: A process $P$ containing an arbitrary program part $S$ is semantically equivalent to two processes $P_1$ and $P_2$, where $P_1$ is derived from $P$ by replacing $S$ with a communication action $C$ on the newly introduced channel $(C, D)$ between $P_1$ and $P_2$, and $P_2 = *[[\overline{D} \rightarrow S; D]]$.

For example, a process $P$ of the form

$$P = *[S_0; S_1; S_2]$$

can be replaced by the semantically equivalent program $(P_1 \| P_2)$, with

$$P_1 = *[S_0; C; S_2]$$

$$P_2 = *[[\overline{D} \rightarrow S_1; D]].$$

Observe that the above decomposition does not introduce concurrency. Although $P_1$ and $P_2$ are potentially concurrent processes, they are never active concurrently: $P_2$ is activated from $P_1$, much as a procedure or a coroutine would be. The only purpose of this transformation is to simplify the structure of each command. Process decomposition is applied repeatedly until the right-hand side of each guarded command is a straight-line program.
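The remark that $P_2$ is activated from $P_1$ “much as a procedure or a coroutine would be” can be illustrated with a Python generator standing in for $P_2$. The sketch is an analogy only, under our own assumptions: resuming the generator plays the role of the communication $C$/$D$, and the statements $S_0$, $S_1$, $S_2$ are merely recorded in a trace. The function names are hypothetical.

```python
from typing import Generator, List


def p2(trace: List[str]) -> Generator[None, None, None]:
    """P2 = *[[probe(D) -> S1; D]]: suspended until P1 starts C."""
    while True:
        yield                 # wait until the probe on D holds
        trace.append("S1")    # S1; looping back to the yield completes D


def p1_with_p2(rounds: int) -> List[str]:
    """(P1 || P2) with P1 = *[S0; C; S2], bounded to `rounds` iterations."""
    trace: List[str] = []
    server = p2(trace)
    next(server)              # start P2; it stops at its guard
    for _ in range(rounds):
        trace.append("S0")
        next(server)          # C: hand control to P2, which runs S1
        trace.append("S2")
    return trace


def p_original(rounds: int) -> List[str]:
    """P = *[S0; S1; S2], bounded to `rounds` iterations."""
    trace: List[str] = []
    for _ in range(rounds):
        trace.extend(["S0", "S1", "S2"])
    return trace
```

The two traces coincide, which reflects the fact that the decomposition only restructures the program and introduces no concurrency.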

Handshaking  Expansion 

The implementation of communication, called “handshaking expansion”, replaces each channel by a pair of wire operators and each communication action by its implementation. Channel (X, Y) is implemented by the two wires $(xo\ \mathrm{w}\ yi)$ and $(yo\ \mathrm{w}\ xi)$.

If X belongs to process $p_1$ and Y to process $p_2$, xo and xi belong to $p_1$, and yo and yi belong to $p_2$. Initially, xo, xi, yo, and yi (which we will call the “handshaking variables of (X, Y)”) are false. Assume that the program has been proved to be deadlock-free and that we can identify a pair of matching actions X and Y in $p_1$ and $p_2$, respectively. We replace X and Y by the sequences $U_x$ and $U_y$, respectively, with:

$$U_x = xo\uparrow; [xi]$$

$$U_y = [yi]; yo\uparrow.$$

Unfortunately, when the communication terminates, all handshaking variables are true. Hence, we cannot implement the next communication with $U_x$ and $U_y$. However, the complementary implementation can be used for the next matching pair, namely:

$$D_x = xo\downarrow; [\neg xi]$$

$$D_y = [\neg yi]; yo\downarrow.$$

The solution consisting of alternating $U_x$ and $D_x$ as an implementation of X, and $U_y$ and $D_y$ as an implementation of Y, is essentially the so-called “two-phase handshaking”, or “two-cycle signaling”. But it is in general not possible to determine syntactically which X- or Y-actions follow each other in an execution. In such cases, two-phase handshaking implementations require testing the current value of the variables. In this paper, we shall use a simpler but less efficient solution known as “four-phase handshaking”, or “four-cycle signaling”.

In a four-phase handshaking protocol, all X-actions are implemented as “$U_x; D_x$” and all Y-actions as “$U_y; D_y$”. Observe that the D-parts in X and Y introduce an extra communication between the two processes whose only purpose is to reset all variables to false. The synchronization introduced by this extra communication is unnoticeable, since the immediately preceding communication implemented by $U_x$ and $U_y$ ensures that both processes reach a matching $D_x$ and $D_y$ “at the same time”.

Both protocols have the property that, for a matching pair (X, Y) of actions, the implementation is not symmetrical in X and Y. One action is called active and the other one passive. The four-phase implementation with X active and Y passive is:

$$X = xo\uparrow; [xi]; xo\downarrow; [\neg xi] \qquad (1)$$

$$Y = [yi]; yo\uparrow; [\neg yi]; yo\downarrow. \qquad (2)$$

When no action of a matching pair is probed, the choice of which one should be active and which one passive is arbitrary, but a choice has to be made. The choice can be important for the composition of identical circuits. A simple rule is that, for a given channel (X, Y), all actions at one side are active and all actions at the other side passive. If $\overline{X}$ is used, all X-actions are passive, with the obvious restriction that $\overline{Y}$ cannot be used in the same program.

The implementation of the probe is simply:

$$\overline{X} = xi \qquad \overline{Y} = yi. \qquad (3)$$
Given our definition of suspension, the proof that this implementation of the probe fulfils the definition of Section 2 is straightforward and is omitted.

A probed communication action $\overline{X} \rightarrow X$ is implemented as:

$$xi \rightarrow xo\uparrow; [\neg xi]; xo\downarrow.$$

Basic  properties 

The  following  properties  of  the  handshaking  protocol  play  an  important 
role  in  the  compilation  method. 

Property 1: For the pair of wires $(xo\ \mathrm{w}\ yi)$ and $(yo\ \mathrm{w}\ xi)$, used together as in (1) and (2), with all variables initially false, the following sequence of transitions is guaranteed to occur if the system is deadlock-free:

$$*[xo\uparrow; yi\uparrow; yo\uparrow; xi\uparrow; xo\downarrow; yi\downarrow; yo\downarrow; xi\downarrow]. \qquad (4)$$

Hence, the following postconditions hold:

$$xo\uparrow\ \{\Diamond xi\} \qquad xo\downarrow\ \{\Diamond \neg xi\} \qquad yo\uparrow\ \{\Diamond \neg yi\} \qquad (5)$$

In other words, if the system is deadlock-free, the handshaking protocol guarantees that once $xo\uparrow$ has been completed, xi holds eventually; and similarly for $xo\downarrow$ and $yo\uparrow$.
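Property 1 can be observed in a small simulation. The Python sketch below is our own illustration, not part of the method: it runs the active implementation (1) of X and the passive implementation (2) of Y together with the two wires, choosing at random among all actions that are ready, and records every change of a handshaking variable (“+” stands for ↑ and “-” for ↓). Whatever the interleaving, the recorded transitions follow sequence (4).

```python
import random
from typing import List

# Each process is a cyclic list of steps: ("set", v, b) performs v := b and
# ("wait", v, b) is the guard [v = b].
ACTIVE_X = [("set", "xo", True), ("wait", "xi", True),
            ("set", "xo", False), ("wait", "xi", False)]    # equation (1)
PASSIVE_Y = [("wait", "yi", True), ("set", "yo", True),
             ("wait", "yi", False), ("set", "yo", False)]   # equation (2)
WIRES = [("xo", "yi"), ("yo", "xi")]                         # xo w yi, yo w xi


def simulate(n_transitions: int, seed: int) -> List[str]:
    """Interleave both processes and both wires in a random order and
    return the first n_transitions changes of a handshaking variable."""
    rng = random.Random(seed)
    state = {"xo": False, "xi": False, "yo": False, "yi": False}
    programs = {"X": ACTIVE_X, "Y": PASSIVE_Y}
    pc = {"X": 0, "Y": 0}
    trace: List[str] = []
    while len(trace) < n_transitions:
        moves = []
        for name, prog in programs.items():
            kind, var, value = prog[pc[name]]
            if kind == "set" or state[var] == value:
                moves.append(("proc", name))
        for src, dst in WIRES:
            if state[dst] != state[src]:      # the wire has a pending transition
                moves.append(("wire", src, dst))
        if not moves:
            raise RuntimeError("deadlock")
        move = rng.choice(moves)
        if move[0] == "proc":
            name = move[1]
            kind, var, value = programs[name][pc[name]]
            if kind == "set":
                state[var] = value
                trace.append(var + ("+" if value else "-"))
            pc[name] = (pc[name] + 1) % len(programs[name])
        else:
            _, src, dst = move
            state[dst] = state[src]
            trace.append(dst + ("+" if state[dst] else "-"))
    return trace


SEQUENCE_4 = ["xo+", "yi+", "yo+", "xi+", "xo-", "yi-", "yo-", "xi-"]
```

For any seed, `simulate(16, seed)` returns two copies of `SEQUENCE_4`: in this protocol each transition is enabled only by the previous one, so the random scheduler cannot reorder them.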

Property 2: Consider the handshaking expansion of a program p according to (1), (2), and (3). Provided that the cyclic order of the four handshaking actions of a communication command is respected, the last two actions of this command (the two actions of $D_x$ or $D_y$) can be inserted at any place in p without invalidating the semantics of the communication involved. However, modifying the order of these two actions relative to other actions of p may introduce deadlock.

Property  2  is  a  direct  consequence  of  the  way  in  which  we  have  introduced 
the  sequences  Dx  and  Dy.  In  this  paper,  we  will  ignore  the  deadlock  issue 
when  we  re-order  handshaking  actions. 

5.  FOUR-PHASE  FIFO-ELEMENT 

A FIFO-element is a process, say p, communicating with its left-hand neighbor by channel L and with its right-hand neighbor by channel R. For instance, p receives a value from its left-hand neighbor by the input command $L?x$ and sends the received value to its right-hand neighbor by the output action $R!x$, as follows:

$$p = *[L?x; R!x].$$

For the time being, let us ignore the transmission of values over the channels and concentrate on implementing the simpler program:

$$p = *[L; R].$$

The program to be compiled is so simple that, a priori, we see no reason for using process decomposition. (We will see later that, even in this simple case, process decomposition can be useful.) We choose to implement communication commands L and R by four-phase handshaking and, in view of our intention to compose several of these elements, we choose L to be passive and R active. This leads to the handshaking expansion of p:

$$*[[li]; lo\uparrow; [\neg li]; lo\downarrow; ro\uparrow; [ri]; ro\downarrow; [\neg ri]].$$

Because of the cyclic nature of the program, and because all variables are initialized to false, the above program is equivalent to

$$*[[\neg ri \wedge li]; lo\uparrow; [\neg li]; lo\downarrow; ro\uparrow; [ri]; ro\downarrow]. \qquad (6)$$

6.  PRODUCTION-RULE  EXPANSION 

The  next  step  is  to  compile  the  handshaking  expansion  of  the  program 
into  a  set  of  production  rules  from  which  all  explicit  sequencing  has  been 
removed.  By  matching  these  production  rules  to  the  ones  describing  the 
semantics  of  operators,  the  programs  can  be  identified  with  networks  of 
operators.  We  use  the  compilation  of  p  to  illustrate  the  different  steps  of 
the  expansion. 

We start with the production-rule set syntactically derived from the program. In the case of p, it is the set derived from (6), namely:

$$\begin{array}{l} \neg ri \wedge li \mapsto lo\uparrow \\ \neg li \mapsto lo\downarrow \\ \neg lo \mapsto ro\uparrow \\ ri \mapsto ro\downarrow. \end{array}$$
The execution of a production rule is called effective if it changes the value of a variable; otherwise, it is called vacuous. We ignore vacuous executions of production rules. For each guarded command of the program, the production-rule set representation is semantically equivalent to the program representation if and only if the order of execution of effective production rules is the same as the order of the corresponding transitions in the program; we call it the program order. (As an aid to the reader, we list the production rules of a set in program order.)

In general, we have to strengthen the guards of some rules to enforce execution in program order. This is the case in our example: since $\neg lo$ holds initially, the third production rule can be executed first if we don’t strengthen the guards. Moreover, because all handshaking variables of L are back to false when L is completed, we cannot find a guard for the transition $ro\uparrow$ that distinguishes the state after L from the initial state. (Hence, the transitions following a semicolon that can be identified with a semicolon of the original program are likely to be difficult to deal with.)
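The difficulty can be seen by evaluating the unstrengthened rule set in the initial state. In this Python sketch (an illustration only; rule names such as "ro+" for $ro\uparrow$ are our own notation), a rule is listed only if its guard holds and its execution would be effective.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rule = Tuple[str, Callable[[State], bool], str, bool]

# Production rules syntactically derived from (6), before strengthening.
RULES_6: List[Rule] = [
    ("lo+", lambda s: not s["ri"] and s["li"], "lo", True),
    ("lo-", lambda s: not s["li"], "lo", False),
    ("ro+", lambda s: not s["lo"], "ro", True),
    ("ro-", lambda s: s["ri"], "ro", False),
]


def effective_rules(state: State, rules: List[Rule]) -> List[str]:
    """Names of the rules whose guard holds and that would change a variable."""
    return [name for name, guard, var, value in rules
            if guard(state) and state[var] != value]


initial = {"li": False, "lo": False, "ri": False, "ro": False}
print(effective_rules(initial, RULES_6))  # ['ro+']
```

The rule for $ro\uparrow$ is effective before any action of L has taken place, although in program order it must follow $lo\downarrow$; the rule for $lo\downarrow$ is enabled but vacuous, since lo is already false.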

One technique for solving this problem is to exploit the possibility, which follows from Property 2, of shuffling either of the last two actions of the four-phase expansion of a communication command. Of course, the shuffle must maintain the cyclic order of the four actions. The other technique consists of introducing a state variable to identify uniquely the state in which a certain transition is to take place.

In this exercise, we show that the different circuits for the four-phase FIFO correspond to the different ways of applying those two techniques. We first apply different shufflings of the handshaking actions. We will observe that more shuffling leads to simpler circuits, and less shuffling to a “quicker return linkage”. We start with the maximum shuffling and end with the quickest return linkage (no shuffling), which corresponds to an implementation with a state variable.

7.  MAXIMUM  SHUFFLING 


The maximum shuffling that still maintains the order between the first half of the handshaking of L and the first half of the handshaking of R is:

$$*[[\neg ri]; [li]; lo\uparrow; ro\uparrow; [ri]; [\neg li]; lo\downarrow; ro\downarrow]. \qquad (7)$$

This leads to the production-rule expansion:

$$\neg ri \wedge li \mapsto lo\uparrow \qquad (8)$$

$$lo \mapsto ro\uparrow \quad \{\Diamond ri \wedge \Diamond \neg li\} \qquad (9)$$

$$ri \wedge \neg li \mapsto lo\downarrow \qquad (10)$$

$$\neg lo \mapsto ro\downarrow \quad \{\Diamond \neg ri\}. \qquad (11)$$


Using the postconditions indicated between braces (these conditions rely on (5)), it is easy to verify that the production rules of the set are executed in program order. Hence the execution of the production-rule set is equivalent to the execution of (7).

The last step of the compilation, called operator reduction, consists of identifying production rules of the program with production rules defining the operators. We group the production rules that modify the same variable and try to identify them with one or more operators. The production rules (8) and (10) are implemented as $(\neg ri, li)\ \mathrm{C}\ lo$. The production rules (9) and (11) are implemented as $lo\ \mathrm{w}\ ro$. The circuit is represented in Figure 1.a, with the alternative representation of Figure 1.b.
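The claim that the rules fire in program order can also be checked mechanically on this small circuit. The Python sketch below is our own illustration: it assumes an environment consisting of an active left neighbour (driving li) and a passive right neighbour (driving ri), both following the four-phase protocol, explores every state reachable under arbitrary firing orders, and checks that the element’s own transitions always occur in the order of (7) and that no reachable state is a deadlock.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rule = Tuple[str, Callable[[State], bool], str, bool]

# Circuit of Figure 1: (¬ri, li) C lo and lo w ro.
CIRCUIT: List[Rule] = [
    ("lo+", lambda s: not s["ri"] and s["li"], "lo", True),   # (8)
    ("ro+", lambda s: s["lo"], "ro", True),                    # (9)
    ("lo-", lambda s: s["ri"] and not s["li"], "lo", False),  # (10)
    ("ro-", lambda s: not s["lo"], "ro", False),               # (11)
]
# Assumed environment: active left neighbour, passive right neighbour.
ENV: List[Rule] = [
    ("li+", lambda s: not s["li"] and not s["lo"], "li", True),
    ("li-", lambda s: s["li"] and s["lo"], "li", False),
    ("ri+", lambda s: s["ro"] and not s["ri"], "ri", True),
    ("ri-", lambda s: not s["ro"] and s["ri"], "ri", False),
]
PROGRAM_ORDER = ["lo+", "ro+", "lo-", "ro-"]   # element transitions of (7)
VARS = ("li", "lo", "ri", "ro")


def check_program_order() -> List[str]:
    """Explore all reachable (state, position in (7)) pairs; report
    deadlocks and element transitions that occur out of program order."""
    start = ({v: False for v in VARS}, 0)
    seen = {(tuple(start[0][v] for v in VARS), 0)}
    queue = deque([start])
    problems: List[str] = []
    while queue:
        state, pc = queue.popleft()
        moves = [r for r in CIRCUIT + ENV
                 if r[1](state) and state[r[2]] != r[3]]
        if not moves:
            problems.append(f"deadlock in {state}")
        for name, _, var, value in moves:
            next_pc = pc
            if name in PROGRAM_ORDER:
                if name != PROGRAM_ORDER[pc]:
                    problems.append(f"{name} out of order in {state}")
                next_pc = (pc + 1) % len(PROGRAM_ORDER)
            nxt = dict(state, **{var: value})
            key = (tuple(nxt[v] for v in VARS), next_pc)
            if key not in seen:
                seen.add(key)
                queue.append((nxt, next_pc))
    return problems
```

An empty result from `check_program_order()` means that, in this model, every firing order of the circuit and its environment produces the element transitions in the order prescribed by (7).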


8.  LESS  SHUFFLING 

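Here, we shuffle only $lo\downarrow$ in the original handshaking expansion. We get:

$$*[[\neg ri \wedge li]; lo\uparrow; [\neg li]; ro\uparrow; [ri]; lo\downarrow; ro\downarrow].$$

The production-rule expansion gives:

$$\begin{array}{l} \neg ri \wedge li \mapsto lo\uparrow \\ lo \wedge \neg li \mapsto ro\uparrow \\ ri \mapsto lo\downarrow \\ \neg lo \mapsto ro\downarrow. \end{array}$$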

The two production rules that modify lo cannot be immediately identified with an operator, and the same holds for the two production rules that modify ro. In such a case, we perform on the group a last transformation called symmetrization: we transform the guards of the production rules (again, preserving the semantics) so as to make them “look like” the guards of operators. If a guard contains too many variables, this step may also involve decomposing a production rule into several ones by introducing additional variables called padding variables.

For the guards of lo, we observe that we can strengthen the guard $ri$ of $lo\downarrow$ to $\neg li \wedge ri$, since $\neg li$ holds as a precondition of the production rule. For the guards of ro, symmetrization requires weakening the guard $\neg lo$ of $ro\downarrow$ to $li \vee \neg lo$. In this case, since we have weakened the guard, we have to check that we have not enlarged the set of states in which the production rule can be effectively executed. Since $\neg lo$ holds whenever li holds in a state where ro is true, no such state has been added. Hence the transformation is safe. After symmetrization, we get the equivalent set:


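$$\neg ri \wedge li \mapsto lo\uparrow \qquad (12)$$

$$lo \wedge \neg li \mapsto ro\uparrow \qquad (13)$$

$$\neg li \wedge ri \mapsto lo\downarrow \qquad (14)$$

$$li \vee \neg lo \mapsto ro\downarrow. \qquad (15)$$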

Now, the operator reduction is straightforward:

(12) & (14): $(\neg ri, li)\ \mathrm{C}\ lo$

(13) & (15): $(lo, \neg li) \wedge ro$,

which gives the circuit of Figure 2.
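The safety argument for the strengthened and weakened guards can be checked by brute force. The Python sketch below is our own illustration, assuming a four-phase environment on both channels (active on the left, passive on the right): it enumerates the states reachable by the symmetrized circuit and compares, in each of them, whether the original and the transformed guard make their rule effective.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rule = Tuple[str, Callable[[State], bool], str, bool]

# Symmetrized set (12)-(15), i.e. (¬ri, li) C lo and (lo, ¬li) ∧ ro.
CIRCUIT: List[Rule] = [
    ("lo+", lambda s: not s["ri"] and s["li"], "lo", True),     # (12)
    ("ro+", lambda s: s["lo"] and not s["li"], "ro", True),     # (13)
    ("lo-", lambda s: not s["li"] and s["ri"], "lo", False),    # (14)
    ("ro-", lambda s: s["li"] or not s["lo"], "ro", False),     # (15)
]
# Assumed four-phase environment: active on the left, passive on the right.
ENV: List[Rule] = [
    ("li+", lambda s: not s["li"] and not s["lo"], "li", True),
    ("li-", lambda s: s["li"] and s["lo"], "li", False),
    ("ri+", lambda s: s["ro"] and not s["ri"], "ri", True),
    ("ri-", lambda s: not s["ro"] and s["ri"], "ri", False),
]
VARS = ("li", "lo", "ri", "ro")


def reachable_states() -> List[State]:
    """Breadth-first enumeration of the states reachable from all-false."""
    start = {v: False for v in VARS}
    seen = {tuple(start[v] for v in VARS)}
    queue, states = deque([start]), []
    while queue:
        state = queue.popleft()
        states.append(state)
        for _, guard, var, value in CIRCUIT + ENV:
            if guard(state) and state[var] != value:
                nxt = dict(state, **{var: value})
                key = tuple(nxt[v] for v in VARS)
                if key not in seen:
                    seen.add(key)
                    queue.append(nxt)
    return states


def same_effective_states(old_guard, new_guard, var: str, value: bool) -> bool:
    """True if, in every reachable state, the old and the new guard make the
    rule 'var := value' effective in exactly the same states."""
    return all((old_guard(s) and s[var] != value) ==
               (new_guard(s) and s[var] != value)
               for s in reachable_states())
```

For example, `same_effective_states(lambda s: not s["lo"], lambda s: s["li"] or not s["lo"], "ro", False)` compares the guard of $ro\downarrow$ before and after weakening.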


9.  LESSER  SHUFFLING 

In this case we postpone the sequence $[\neg li]; lo\downarrow$ only until after $ro\uparrow$, and $[\neg ri]$ only until after $lo\uparrow$. Again, these shufflings maintain the cyclic order among the handshaking actions of L and among the handshaking actions of R. We get the program:

$$*[[li]; lo\uparrow; [\neg ri]; ro\uparrow; [\neg li]; lo\downarrow; [ri]; ro\downarrow].$$

The production-rule expansion is straightforward:

$$\begin{array}{l} \neg ro \wedge li \mapsto lo\uparrow \\ lo \wedge \neg ri \mapsto ro\uparrow \\ \neg li \wedge ro \mapsto lo\downarrow \\ ri \wedge \neg lo \mapsto ro\downarrow, \end{array}$$

which immediately leads to the operators:

$$(\neg ro, li)\ \mathrm{C}\ lo \qquad (lo, \neg ri)\ \mathrm{C}\ ro.$$

The  circuit  is  represented  in  Figure  3. 


-Figure  3- 
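Because L is passive and R active, elements of this kind can be chained. The Python sketch below is our own check, with assumed environments at both ends: it chains n elements of Figure 3, explores every reachable state under arbitrary firing orders, and verifies that no state is a deadlock and that every channel follows the four-phase order. Chaining the two C-elements of each element yields a sequence of C-elements in which each wire is computed as C(previous wire, ¬next wire), the structure often called a Muller pipeline.

```python
from collections import deque
from typing import List, Tuple

# A chain of n elements of Figure 3 is described by wires c[0..2n+1]:
# element k has li = c[2k], lo = c[2k+1], ro = c[2k+2], ri = c[2k+3].
# Each element is (¬ro, li) C lo and (lo, ¬ri) C ro, so every internal
# wire satisfies c[j] := C(c[j-1], ¬c[j+1]).
Wires = Tuple[bool, ...]


def successors(c: Wires) -> List[Tuple[int, Wires]]:
    """Effective transitions, as (index of the changed wire, new state)."""
    last = len(c) - 1
    moves = []
    for j in range(last + 1):
        if j == 0:                        # assumed active left environment
            target = not c[1]             # li↑ when ¬lo, li↓ when lo
        elif j == last:                   # assumed passive right environment
            target = c[last - 1]          # ri follows ro
        elif c[j - 1] == (not c[j + 1]):  # both C-element inputs agree
            target = c[j - 1]
        else:
            continue                      # mixed inputs: c[j] holds its value
        if c[j] != target:
            moves.append((j, c[:j] + (target,) + c[j + 1:]))
    return moves


LEGAL = {((False, False), (True, False)), ((True, False), (True, True)),
         ((True, True), (False, True)), ((False, True), (False, False))}


def check_chain(n: int) -> List[str]:
    """Explore all reachable states of n chained elements; report deadlocks
    and four-phase protocol violations on the n + 1 channels."""
    start: Wires = (False,) * (2 * n + 2)
    seen, queue, problems = {start}, deque([start]), []
    while queue:
        c = queue.popleft()
        moves = successors(c)
        if not moves:
            problems.append(f"deadlock in {c}")
        for _, nxt in moves:
            for k in range(n + 1):        # channel k = (c[2k], c[2k+1])
                before = (c[2 * k], c[2 * k + 1])
                after = (nxt[2 * k], nxt[2 * k + 1])
                if before != after and (before, after) not in LEGAL:
                    problems.append(f"protocol violation on channel {k}")
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return problems
```

`check_chain(3)` explores a chain of three elements; an empty list means that no deadlock and no protocol violation was found in this model.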


10.  QUICK-RETURN  LINKAGE 

We will now compile p without shuffling actions. We will observe that the compilation is more complicated than with shuffling but leads to more efficient circuits: the L-handshaking sequence is completed before the R-handshaking sequence starts. For this reason, such an implementation is sometimes called a “quick-return linkage” [8].

First  implementation 

(This solution has been designed together with Huub Schols, from Eindhoven University of Technology.) In order to define the precondition of $ro\uparrow$ uniquely, we now introduce a state variable u as follows:

$$*[[\neg ri \wedge li]; lo\uparrow; u\uparrow; [\neg li]; lo\downarrow; ro\uparrow; [ri]; u\downarrow; ro\downarrow].$$

An additional problem here is that the condition $\neg ri \wedge li$ is not strong enough as a precondition of $lo\uparrow$: since the implementation of L is passive, li can become true after $lo\downarrow$, i.e., $\neg ri \wedge li$ may hold after $lo\downarrow$. We strengthen the condition with $\neg u$ and get the correct production-rule set:

$$\neg ri \wedge \neg u \wedge li \mapsto lo\uparrow \qquad (18)$$

$$lo \mapsto u\uparrow \qquad (19)$$

$$\neg li \wedge u \mapsto lo\downarrow \qquad (20)$$

$$u \wedge \neg lo \mapsto ro\uparrow \qquad (21)$$

$$ri \mapsto u\downarrow \qquad (22)$$

$$\neg u \mapsto ro\downarrow. \qquad (23)$$

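The symmetrizations of (19) & (22) and of (21) & (23) are straightforward:

$$\neg ri \wedge lo \mapsto u\uparrow \qquad (19')$$

$$\neg lo \wedge ri \mapsto u\downarrow \qquad (22')$$

$$u \wedge \neg lo \mapsto ro\uparrow \qquad (21)$$

$$lo \vee \neg u \mapsto ro\downarrow. \qquad (23')$$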

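For the symmetrization of (18) & (20), since the guard of (18) contains three variables, we introduce a padding variable y to decompose the guard:

$$\neg u \wedge li \mapsto y\uparrow \qquad (24)$$

$$\neg ri \wedge y \mapsto lo\uparrow \qquad (25)$$

$$u \wedge \neg li \mapsto y\downarrow \qquad (26)$$

$$ri \vee \neg y \mapsto lo\downarrow. \qquad (27)$$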

(For the newly introduced variable y, we have to check that no effective transition other than (24) and (26) is possible in the production-rule expansion.) The operator reduction now gives:

(19') & (22'): $(\neg ri, lo)\ \mathrm{C}\ u$

(21) & (23'): $(u, \neg lo) \wedge ro$

(24) & (26): $(li, \neg u)\ \mathrm{C}\ y$

(25) & (27): $(\neg ri, y) \wedge lo$,

which gives the circuit of Figure 4.
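As for the simpler circuits, an exhaustive exploration gives some confidence that the padding variable and the symmetrized guards behave as intended. The Python sketch below is our own check, not part of the method. It assumes a four-phase environment on both channels and treats every operator output as an independent variable, so any firing order is possible. It reports deadlocks, protocol violations, guards that are not stable (a firing that disables another enabled rule), and any $ro\uparrow$ that fires while lo is still true.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rule = Tuple[str, Callable[[State], bool], str, bool]

# Circuit of Figure 4, written as production rules.
CIRCUIT: List[Rule] = [
    ("y+", lambda s: s["li"] and not s["u"], "y", True),        # (24)
    ("y-", lambda s: not s["li"] and s["u"], "y", False),       # (26)
    ("lo+", lambda s: not s["ri"] and s["y"], "lo", True),      # (25)
    ("lo-", lambda s: s["ri"] or not s["y"], "lo", False),      # (27)
    ("u+", lambda s: not s["ri"] and s["lo"], "u", True),       # (19')
    ("u-", lambda s: s["ri"] and not s["lo"], "u", False),      # (22')
    ("ro+", lambda s: s["u"] and not s["lo"], "ro", True),      # (21)
    ("ro-", lambda s: s["lo"] or not s["u"], "ro", False),      # (23')
]
# Assumed four-phase environment: active left neighbour, passive right one.
ENV: List[Rule] = [
    ("li+", lambda s: not s["li"] and not s["lo"], "li", True),
    ("li-", lambda s: s["li"] and s["lo"], "li", False),
    ("ri+", lambda s: s["ro"] and not s["ri"], "ri", True),
    ("ri-", lambda s: not s["ro"] and s["ri"], "ri", False),
]
VARS = ("li", "lo", "ri", "ro", "u", "y")
CHANNELS = [("li", "lo"), ("ro", "ri")]          # (request, acknowledge)
LEGAL = {((False, False), (True, False)), ((True, False), (True, True)),
         ((True, True), (False, True)), ((False, True), (False, False))}


def enabled(state: State) -> List[Rule]:
    """Rules whose guard holds and whose firing would be effective."""
    return [r for r in CIRCUIT + ENV if r[1](state) and state[r[2]] != r[3]]


def check() -> List[str]:
    start = {v: False for v in VARS}
    seen = {tuple(start[v] for v in VARS)}
    queue, problems = deque([start]), []
    while queue:
        s = queue.popleft()
        moves = enabled(s)
        if not moves:
            problems.append(f"deadlock in {s}")
        for rule in moves:
            name, _, var, value = rule
            nxt = dict(s, **{var: value})
            for req, ack in CHANNELS:
                step = ((s[req], s[ack]), (nxt[req], nxt[ack]))
                if step[0] != step[1] and step not in LEGAL:
                    problems.append(f"{name} breaks the protocol of {req}/{ack}")
            if name == "ro+" and s["lo"]:
                problems.append("ro+ fired before lo returned to false")
            after = enabled(nxt)
            for other in moves:          # stability of the other guards
                if other is not rule and other not in after:
                    problems.append(f"{name} disables {other[0]}")
            key = tuple(nxt[v] for v in VARS)
            if key not in seen:
                seen.add(key)
                queue.append(nxt)
    return problems
```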
Second implementation

We decompose p into two processes q and t:

$$q = *[C; R] \qquad t = *[[\overline{D} \rightarrow L; D]],$$

where (C, D) is a newly introduced channel. According to the process decomposition rule, $(q \| t)$ is equivalent to p. Because C matches D and D is probed, C has to be implemented as active and D as passive. It turns out that the implementation of q with C and R both active is simpler than the original one, and that t is also easy to implement. Again, the compilation of q without shuffling requires introducing a state variable u:

$$q = *[co\uparrow; [ci]; u\uparrow; co\downarrow; [\neg ci]; ro\uparrow; [ri]; u\downarrow; ro\downarrow; [\neg ri]].$$

The  production  rule  expansion  gives: 

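$$\begin{aligned}
\neg ri \wedge \neg u &\mapsto co\uparrow \\
\neg ri \wedge ci &\mapsto u\uparrow \\
ri \vee u &\mapsto co\downarrow \\
u \wedge \neg ci &\mapsto ro\uparrow \\
ri \wedge \neg ci &\mapsto u\downarrow \\
\neg u \vee ci &\mapsto ro\downarrow.
\end{aligned}$$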

The  operator  reduction  gives: 

$(\neg ri,\, \neg u)\ \wedge\ co$

$(\neg ri,\, ci)\ \mathrm{C}\ u$

$(u,\, \neg ci)\ \wedge\ ro.$

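The circuit is shown in Figure 5, in which $(\neg ri,\, \neg u)\ \wedge\ co$ is replaced by
the equivalent NOR gate $(ri,\, u)\ \overline{\vee}\ co$, since $\neg ri \wedge \neg u = \neg(ri \vee u)$.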
-Figure 5-
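As a check on the operators of q, the sketch below (our own illustration; the names `RULES`, `CYCLE`, and `run` are hypothetical) simulates them against passive four-phase partners on C and R, which are assumptions about the environment. Because q's handshaking expansion is purely sequential, exactly one transition should be enabled at every step, and the trace should reproduce the expansion with the environment's responses ci+, ci-, ri+, ri- satisfying the waits [ci], [¬ci], [ri], [¬ri].

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Guard = Callable[[State], bool]

RULES: Dict[str, Tuple[Guard, Guard]] = {
    # (¬ri, ¬u) ∧ co
    "co": (lambda s: not s["ri"] and not s["u"], lambda s: s["ri"] or s["u"]),
    # (¬ri, ci) C u
    "u": (lambda s: not s["ri"] and s["ci"], lambda s: s["ri"] and not s["ci"]),
    # (u, ¬ci) ∧ ro
    "ro": (lambda s: s["u"] and not s["ci"], lambda s: not s["u"] or s["ci"]),
    # Assumed environment: passive four-phase partners on C and R.
    "ci": (lambda s: s["co"], lambda s: not s["co"]),
    "ri": (lambda s: s["ro"], lambda s: not s["ro"]),
}

# One period of q's handshaking expansion, with the environment's responses.
CYCLE = ["co+", "ci+", "u+", "co-", "ci-", "ro+", "ri+", "u-", "ro-", "ri-"]


def run(steps: int) -> List[str]:
    """Fire transitions one by one, insisting that exactly one is enabled each time."""
    s: State = {var: False for var in RULES}
    trace: List[str] = []
    for _ in range(steps):
        moves = [(v, True) for v, (up, _) in RULES.items() if up(s) and not s[v]]
        moves += [(v, False) for v, (_, down) in RULES.items() if down(s) and s[v]]
        if len(moves) != 1:
            raise RuntimeError(f"expected one enabled transition, found {moves}")
        var, val = moves[0]
        s[var] = val
        trace.append(var + ("+" if val else "-"))
    return trace


print(run(10) == CYCLE)  # True
```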


The  compilation  of  t  is  straightforward.  The  handshaking  expansion  gives: 

$$*[[di];\ [li];\ lo\uparrow;\ [\neg li];\ lo\downarrow;\ do\uparrow;\ [\neg di];\ do\downarrow].$$

Since D is one end of the internal channel (C, D), we can shuffle the sequence $[\neg li];\ lo\downarrow$
with respect to D without changing the order of L relative to R. We get:

$$*[[di];\ [li];\ lo\uparrow, do\uparrow;\ [\neg di];\ [\neg li];\ lo\downarrow, do\downarrow].$$

The  production  rule  expansion  leading  to  the  circuit  of  Figure  6  is: 

$$\begin{aligned}
di \wedge li &\mapsto lo\uparrow, do\uparrow \\
\neg di \wedge \neg li &\mapsto lo\downarrow, do\downarrow.
\end{aligned}$$


-Figure  6- 

The  complete  circuit  of  Figure  7  is  obtained  by  composing  the  circuits  of 
Figure  5  and  Figure  6. 




-Figure  7- 
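The composition of Figures 5 and 6 can be simulated in the same way. In this sketch (our own, with hypothetical names), the output node of t's C-element is modeled as a single variable `lo` that also serves as do and hence as q's input ci, which amounts to treating that fork as isochronic; t's input di is q's output co. The environment is assumed to be a four-phase sender on L and a receiver on R. Besides counting unstable guards, the sketch checks the quick-return property: each L handshake is completed (lo+, then lo-) before the corresponding output ro+ starts.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Guard = Callable[[State], bool]
Move = Tuple[str, bool]

# "lo" is the output node of t's C-element; its fork also drives do, which is q's ci.
# q's output co is t's input di.
RULES: Dict[str, Tuple[Guard, Guard]] = {
    # q (Figure 5), with ci read from the shared node "lo"
    "co": (lambda s: not s["ri"] and not s["u"], lambda s: s["ri"] or s["u"]),
    "u": (lambda s: not s["ri"] and s["lo"], lambda s: s["ri"] and not s["lo"]),
    "ro": (lambda s: s["u"] and not s["lo"], lambda s: not s["u"] or s["lo"]),
    # t (Figure 6): (di, li) C lo, with di = co
    "lo": (lambda s: s["co"] and s["li"], lambda s: not s["co"] and not s["li"]),
    # Assumed environment: four-phase sender on L, four-phase receiver on R.
    "li": (lambda s: not s["lo"], lambda s: s["lo"]),
    "ri": (lambda s: s["ro"], lambda s: not s["ro"]),
}


def enabled(s: State) -> List[Move]:
    """Effective transitions: guard true and variable not already at the target value."""
    moves: List[Move] = []
    for var, (up, down) in RULES.items():
        if up(s) and not s[var]:
            moves.append((var, True))
        if down(s) and s[var]:
            moves.append((var, False))
    return moves


def simulate(seed: int, steps: int = 400) -> Tuple[List[str], int]:
    """Random interleaving; counts transitions disabled before they could fire."""
    rng = random.Random(seed)
    s: State = {var: False for var in RULES}
    trace: List[str] = []
    hazards = 0
    for _ in range(steps):
        before = enabled(s)
        if not before:
            break  # deadlock
        move = rng.choice(before)
        var, val = move
        s[var] = val
        trace.append(var + ("+" if val else "-"))
        after = enabled(s)
        hazards += sum(1 for m in before if m != move and m not in after)
    return trace, hazards


def quick_return(trace: List[str]) -> bool:
    """Every input is accepted and released (lo+, lo-) before its output starts (ro+)."""
    events = [e for e in trace if e in ("lo+", "lo-", "ro+")]
    pattern = ["lo+", "lo-", "ro+"]
    return all(e == pattern[i % 3] for i, e in enumerate(events))


trace, hazards = simulate(seed=4)
print(hazards, quick_return(trace))
```

Modeling lo and do as one node matters: with two independent gates, q could lower co (that is, di) before lo+ has fired, which would make t's guard unstable.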
11. MESSAGE PASSING AND DOUBLE-RAIL ENCODING

Let us now go back to the original program $p \equiv *[L?x;\ R!x]$, where x is
an internal Boolean variable. In order to implement p, we duplicate the
channels L and R and use channels L1 and L2 to input the values true
and false respectively, and channels R1 and R2 to output the values true
and false respectively. We get

$$\begin{aligned}
pp \equiv *[[\ & \overline{L1} \rightarrow L1;\ R1 \\
\mid\ & \overline{L2} \rightarrow L2;\ R2\ ]]
\end{aligned}$$

with $\neg\overline{L1} \vee \neg\overline{L2}$ invariantly true. Using the first solution, we get the
handshaking  expansion: 

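$$\begin{aligned}
pp \equiv *[[\ & \neg r1i \wedge l1i \rightarrow l1o\uparrow, r1o\uparrow;\ [r1i \wedge \neg l1i];\ l1o\downarrow, r1o\downarrow \\
\mid\ & \neg r2i \wedge l2i \rightarrow l2o\uparrow, r2o\uparrow;\ [r2i \wedge \neg l2i];\ l2o\downarrow, r2o\downarrow\ ]]
\end{aligned}$$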

Next, we have to ensure mutual exclusion between the two guarded
commands in order to be able to replace them by a set of production rules.

Assume pp is inside the first guarded command. Since $\neg l1i \vee \neg l2i$ holds
as a consequence of $\neg\overline{L1} \vee \neg\overline{L2}$, the second guard is false as long as pp
has not completed $l1o\downarrow$. Since r1i holds until pp has completed $r1o\downarrow$, the
second guard is guaranteed to remain false as long as pp is inside the first
guarded command, if we strengthen the second guard as:

$$\neg r1i \wedge \neg r2i \wedge l2i.$$

And  symmetrically  for  the  first  guard.  We  get: 

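$$\begin{aligned}
pp \equiv *[[\ & \neg r1i \wedge \neg r2i \wedge l1i \rightarrow l1o\uparrow, r1o\uparrow;\ [r1i \wedge \neg l1i];\ l1o\downarrow, r1o\downarrow \\
\mid\ & \neg r2i \wedge \neg r1i \wedge l2i \rightarrow l2o\uparrow, r2o\uparrow;\ [r2i \wedge \neg l2i];\ l2o\downarrow, r2o\downarrow\ ]]
\end{aligned}$$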

The  production  rule  expansion  for  the  first  guarded  command  gives: 


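$$\neg r1i \wedge \neg r2i \mapsto a\downarrow \qquad (28)$$

$$\neg a \wedge l1i \mapsto u\uparrow \qquad (29)$$

$$u \mapsto l1o\uparrow, r1o\uparrow \qquad (30)$$

$$r1i \vee r2i \mapsto a\uparrow \qquad (31)$$

$$a \wedge \neg l1i \mapsto u\downarrow \qquad (32)$$

$$\neg u \mapsto l1o\downarrow, r1o\downarrow \qquad (33)$$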
The operator reduction gives:


(28) & (31): $(r1i,\, r2i)\ \vee\ a$

(29) & (32): $(\neg a,\, l1i)\ \mathrm{C}\ u$

(30) & (33): $u$ is connected directly to $l1o$ and $r1o$ (a wire with a fork)

The  operator  reduction  of  the  second  guarded  command  is  identical.  The 
final  circuit  is: 




-Figure  8- 
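The double-rail circuit can be checked in the same style. This is a sketch under stated assumptions: `u1` and `u2` are our names for the variable u of the first and of the second guarded command, u1 drives both l1o and r1o (rules (30) and (33)) and u2 likewise, and the environment is assumed to send a randomly chosen bit on L1 or L2 only when l1i, l2i, l1o, and l2o are all low, and to acknowledge R1 and R2 by copying r1o and r2o. Only the circuit's own transitions are checked for stability, because the environment's choice between L1 and L2 is deliberately nondeterministic. The functions are hypothetical.

```python
import random
from typing import Dict, List, Tuple

State = Dict[str, bool]
Move = Tuple[str, bool]


def circuit_moves(s: State) -> List[Move]:
    """Effective transitions of the three operators of the double-rail circuit."""
    guards = {
        "a": (s["r1i"] or s["r2i"], not s["r1i"] and not s["r2i"]),  # (r1i, r2i) ∨ a
        "u1": (not s["a"] and s["l1i"], s["a"] and not s["l1i"]),    # (¬a, l1i) C u1
        "u2": (not s["a"] and s["l2i"], s["a"] and not s["l2i"]),    # (¬a, l2i) C u2
    }
    moves: List[Move] = []
    for var, (up, down) in guards.items():
        if up and not s[var]:
            moves.append((var, True))
        if down and s[var]:
            moves.append((var, False))
    return moves


def env_moves(s: State) -> List[Move]:
    """Assumed four-phase environment; u1 stands for l1o and r1o, u2 for l2o and r2o."""
    moves: List[Move] = []
    if not (s["l1i"] or s["l2i"] or s["u1"] or s["u2"]):
        moves += [("l1i", True), ("l2i", True)]  # send true on L1 or false on L2
    if s["l1i"] and s["u1"]:
        moves.append(("l1i", False))
    if s["l2i"] and s["u2"]:
        moves.append(("l2i", False))
    if s["r1i"] != s["u1"]:
        moves.append(("r1i", s["u1"]))  # receiver on R1 follows r1o
    if s["r2i"] != s["u2"]:
        moves.append(("r2i", s["u2"]))  # receiver on R2 follows r2o
    return moves


def simulate(seed: int, steps: int = 500) -> Tuple[List[bool], List[bool], int]:
    """Return bits sent on L, bits delivered on R, and the number of unstable guards."""
    rng = random.Random(seed)
    s: State = {v: False for v in ("a", "u1", "u2", "l1i", "l2i", "r1i", "r2i")}
    sent: List[bool] = []
    received: List[bool] = []
    hazards = 0
    for _ in range(steps):
        before = circuit_moves(s)
        candidates = before + env_moves(s)
        if not candidates:
            break  # deadlock
        move = rng.choice(candidates)
        var, val = move
        s[var] = val
        if val and var in ("l1i", "l2i"):
            sent.append(var == "l1i")
        if val and var in ("u1", "u2"):
            received.append(var == "u1")
        after = circuit_moves(s)
        hazards += sum(1 for m in before if m != move and m not in after)
    return sent, received, hazards


sent, received, hazards = simulate(seed=7)
print(hazards, received == sent[:len(received)])
```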


If we let the two L-channels share wire lo, and the two R-channels share
wire ri, we get the circuit of Figure 9.




-Figure  9- 


12.  COMPLETE  FIFO-ELEMENT  WITH  “QUICK  RETURN” 

We  use  the  second  “quick  return”  solution  and  decompose  pp  as: 

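$$\begin{aligned}
q1 &\equiv *[C1;\ R1] \\
q2 &\equiv *[C2;\ R2] \\
t1 &\equiv *[[\overline{D1} \wedge \overline{L1} \rightarrow L1;\ D1]] \\
t2 &\equiv *[[\overline{D2} \wedge \overline{L2} \rightarrow L2;\ D2]]
\end{aligned}$$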

In order to guarantee that the concurrent execution of q1, q2, t1, and t2
is equivalent to the execution of pp, we have to strengthen the guards of
the handshaking expansions of t1 and t2 so as to enforce mutual exclusion
between the executions of the first and the second guarded commands of
pp. From the handshaking expansions of q and t in Section 10, we observe
that when pp is executing its first guarded command, $l1i \vee \neg d1i$ holds, and
symmetrically when pp is executing its second guarded command. Since
$\neg l1i \vee \neg l2i$ is guaranteed by definition, it suffices to strengthen the guards
of t1 and t2 as $d1i \wedge l1i \wedge d2i$ and $d2i \wedge l2i \wedge d1i$, respectively: the strengthened
guard of t2 implies $l2i \wedge d1i$, hence $\neg l1i \wedge d1i$, which contradicts $l1i \vee \neg d1i$.
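The argument for the strengthened guards is a small propositional claim, so it can be checked exhaustively. The sketch below (hypothetical function names) enumerates all values of l1i, l2i, d1i, d2i, keeps the states in which pp is inside its first guarded command ($l1i \vee \neg d1i$) and the encoding invariant $\neg l1i \vee \neg l2i$ holds, and lists those in which t2's guard would still be true.

```python
from itertools import product
from typing import Callable, List, Tuple

Guard = Callable[[bool, bool, bool], bool]  # (d1i, d2i, l2i) -> bool


def original_guard(d1i: bool, d2i: bool, l2i: bool) -> bool:
    """Guard of t2 before strengthening: d2i ∧ l2i."""
    return d2i and l2i


def strengthened_guard(d1i: bool, d2i: bool, l2i: bool) -> bool:
    """Guard of t2 after strengthening: d2i ∧ l2i ∧ d1i."""
    return d2i and l2i and d1i


def violations(guard: Guard) -> List[Tuple[bool, bool, bool, bool]]:
    """States (l1i, l2i, d1i, d2i) where pp is in its first guarded command,
    the double-rail invariant holds, and t2 could nevertheless fire."""
    bad = []
    for l1i, l2i, d1i, d2i in product([False, True], repeat=4):
        in_first_command = l1i or not d1i    # invariant stated in the text
        encoding_holds = not l1i or not l2i  # from ¬L̄1 ∨ ¬L̄2
        if in_first_command and encoding_holds and guard(d1i, d2i, l2i):
            bad.append((l1i, l2i, d1i, d2i))
    return bad


print(violations(original_guard))      # [(False, True, False, True)]
print(violations(strengthened_guard))  # []
```

The single counterexample for the unstrengthened guard is the state in which q1 is still busy (d1i low) and a new value arrives on L2: without the extra conjunct d1i, t2 would accept it before pp has finished its first guarded command.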

Apart from this transformation, the rest of the compilation is identical to
the compilation of q and t. The only difference, caused by the strengthening
of the guards of t1 and t2, is that the production rules of l1o in t1 have to
be implemented by the enabled C-element:

$$(d1i,\, l1i;\ d2i)\ \mathrm{eC}\ l1o$$

and the production rules of l2o in t2 have to be implemented by the enabled
C-element:

$$(d2i,\, l2i;\ d1i)\ \mathrm{eC}\ l2o.$$

The  complete  circuit  is  shown  in  Figure  10. 
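The enabled C-element can be described by its next-state function. The version below is our reading of the production rules, not a gate definition taken from the text: it assumes that only the rising transition of l1o is strengthened with the enable input d2i, while the falling rule of t, $\neg d1i \wedge \neg l1i \mapsto l1o\downarrow$, is unchanged. The function name is hypothetical.

```python
def enabled_c_element(a: bool, b: bool, enable: bool, out: bool) -> bool:
    """Next output of (a, b; enable) eC out, under our reading of the rules:
    rise when a ∧ b ∧ enable, fall when ¬a ∧ ¬b, otherwise keep the old value."""
    if a and b and enable:
        return True
    if not a and not b:
        return False
    return out


# Usage for t1: l1o = enabled_c_element(d1i, l1i, d2i, l1o)
print(enabled_c_element(True, True, False, False))   # False: t1 waits while d2i is low
print(enabled_c_element(True, True, True, False))    # True
print(enabled_c_element(False, False, False, True))  # False: reset ignores the enable
print(enabled_c_element(True, False, True, True))    # True: inputs disagree, output holds
```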

13.  CONCLUSION 

In  this  application  of  the  method,  we  have  shown  how  different  circuits 
for  a  self-timed  FIFO-element  can  be  derived  from  the  different  heuristics 
that the method allows. The example also clearly illustrates the trade-offs
between  ease  of  compilation  and  simplicity  of  the  circuits  on  the  one  hand, 
and  efficiency  on  the  other  hand. 

We have used only four-phase handshaking in this example, although two-phase
handshaking is more efficient, since it uses only half as many handshaking
transitions per communication. Unfortunately, two-phase handshaking is more
difficult to realize because of the necessity to record the current value of
the handshaking variables, and therefore we have first developed a method based
on four-phase handshaking. However, recent experiments with two-phase handshaking
indicate that the method can handle both protocols.
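The difference between the two protocols can be made concrete. In this sketch (hypothetical names), a four-phase communication produces four wire transitions and a two-phase communication two; the two-phase receiver has to store the last request value it acknowledged, because a new request is signalled by a change of the request wire rather than by its level. This is the recording burden mentioned above.

```python
from typing import List, Tuple

Event = Tuple[str, bool]  # (wire, new value)


def four_phase(n: int) -> List[Event]:
    """Wire transitions of n four-phase communications: request r, acknowledge a."""
    events: List[Event] = []
    for _ in range(n):
        events += [("r", True), ("a", True), ("r", False), ("a", False)]
    return events


def two_phase(n: int) -> List[Event]:
    """Two-phase: every transition of r, up or down, is a new request."""
    events: List[Event] = []
    r = a = False
    for _ in range(n):
        r = not r
        events.append(("r", r))
        a = not a
        events.append(("a", a))
    return events


class TwoPhaseReceiver:
    """Must remember the request value it last acknowledged: a pending request is
    signalled by r differing from that value, not by r being high."""

    def __init__(self) -> None:
        self.last_acknowledged = False

    def has_request(self, r: bool) -> bool:
        return r != self.last_acknowledged

    def acknowledge(self, r: bool) -> None:
        self.last_acknowledged = r


print(len(four_phase(10)), len(two_phase(10)))  # 40 20
```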

The  operators  used  to  construct  the  circuits  are  all  well-known  and  VLSI 
implementations  exist  for  all  of  them.  (The  enabled  C-element  is  the  only 
one  the  implementation  of  which  is  somewhat  difficult.  Fortunately,  it  can 
be  replaced  most  of  the  time  by  an  asymmetric  C-element,  which  is  easier 
to  implement.  This  is  the  case  for  the  circuit  of  Figure  10.) 

The  most  important  assumption  on  which  the  correct  functioning  of  the 
circuits  depends  is  the  stability  assumption  for  the  guards  of  operators. 
The stability of a guard is guaranteed by two properties. On the one
hand, the compilation method sees to it that a change of value on a single
wire is followed by a change of value of the output variable of the operator
of which the wire is an input. A change of value on a fork is followed by a
change of value of the output variable of at least one of the operators of
which the fork is an input. Since we assume the forks to be isochronic, this
is enough to guarantee that the change has reached all outputs of the fork
before a new change occurs. On the other hand, we may assume that a change of
value (a change of voltage) on a VLSI wire is monotonic. The combination of
these two properties guarantees the stability of the guards.

Often, the isochronicity of the forks is not necessary. When it is, it is
enough to ensure, for a binary fork, that the delay in one branch of the fork
is shorter than the delay through the gate connected to the other branch.